Simplifying the Complexity of Machine Translation

Authors

  • Randall Sharp
  • Oliver Streiter
Abstract

A description of the CAT2 machine translation system is presented as an example of a model which stresses simplicity over complexity. The design of the formalism encompasses a minimum of formal constructs, yet is powerful enough to describe complex linguistic and translational phenomena. The paper presents an overview of the formalism, followed by examples of its usage in linguistic applications. A critical evaluation is presented, in which the authors discuss the shortcomings of the system and present the directions that are being taken to achieve a more realistic, and more simplified, model of machine translation.

Résumé (translated from the French): The prototype of the CAT2 machine translation system is presented as an example of a system that emphasizes the notion of simplicity. The few formal foundations of the formalism are explained and illustrated with linguistic applications. A few examples show how such a formalism makes it possible, with very simple means, to build grammars capable of handling complex and linguistically interesting phenomena. Future developments of the current formalism are also sketched and motivated.

Randall Sharp and Oliver Streiter, IAI, Martin-Luther-Straße 14, D-6600 Saarbrücken 3, Germany

In developing any natural language processing (NLP) system, particularly for machine translation (MT) applications, it is crucial that the basic constructs of the system become, and remain, simple. Otherwise the complexity of the system becomes overwhelming (1) for the system developers, who have to maintain the system over its lifetime, (2) for the linguists and translators who maintain the grammars, lexicons, and translation dictionaries, and (3) for the end-users who actually make use of the system in some application.
The notion of simplicity is often at odds with notions of power, expressiveness, and efficiency, which one obviously also needs but which seem to promote complexity. Complexity in turn often, and always unwittingly, metamorphoses into complicatedness, where the internal and external workings of the system are no longer well understood. This tension between simplicity and appropriate complexity underlies the design and development of an experimental MT system known as CAT2.

CAT2, a transfer-based MT system, was first developed in 1987 as a sideline to Eurotra, the massive MT project sponsored by the CEC in Luxembourg encompassing nine languages in twelve countries. In 1985 the CEC sponsored a first design based on the <C,A>,T framework (Arnold et al. 1985, 1986), in which a "rapid prototyping" approach was advocated. However, given the scope of the undertaking and the intense and varied use to which the prototype was subjected, its inadequacy as a working system soon became apparent. In 1987, the CEC cancelled this attempt and developed the Engineering Framework (Bech & Nygaard 1988), which has been in development up to the present time. Although some of the shortcomings of the original prototype were overcome, the "E-Framework" also introduced a number of new complexities (see Sharp (1991) for a review), to the point where the CEC has once again undertaken a new study to develop a third, even more powerful, prototype (Alshawi et al. 1991), scheduled for completion in 1994. In the meantime, a number of sideline alternatives have sprung up in order to continue the research into machine translation, among them the MiMo system (Arnold & Sadler 1990), CLG (Balari et al. 1990), and the CAT2 system (Sharp 1988).
The CAT2 system, adhering to the original <C,A>,T methodology and abiding by the maxim of controlled simplicity, has continued to develop into a viable MT prototype in which current theories of computation, linguistics, computational linguistics, and translation are being tested, refined and applied. To date, experimental translation systems have been developed for English, French, German, Spanish, Greek, Portuguese, Russian, Czech, and Japanese. In this article we will look at the nature of the CAT2 formalism, and try to reflect the notion of simplicity behind its design. The first section describes the formalism, the second illustrates its application, and the third covers some of the intended extensions to the formalism which will further enhance and underline its inherent simplicity.



Publication date: 2002